An Evaluation of Streaming Algorithms for Distinct Counting Over a Sliding Window
Abstract
Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring. On a stream, it is often necessary to compute an aggregate over only the most recent elements, which leads to the problem of distinct counting over a “sliding window” of the stream. We present a detailed experimental study of the performance of different algorithms for distinct counting over a sliding window. We observe that the performance of an algorithm depends on the basic method used, as well as on aspects such as the hash function, the mix of queries and updates, and the method used to boost accuracy. We compare the performance of prominent algorithms and evaluate the influence of these factors, leading to practical recommendations for implementation. To the best of our knowledge, this is the first detailed experimental study of distinct counting over a sliding window.
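To make the problem statement concrete, the following is a minimal sketch of exact distinct counting over a count-based sliding window (the last `window_size` elements). It is a naive baseline for illustration only and is not one of the algorithms evaluated in the study; the class name `ExactWindowDistinctCounter` and its methods are hypothetical. Because it stores every element in the window, its memory use grows linearly with the window size, which is the cost that approximate streaming algorithms aim to avoid.

```python
from collections import deque


class ExactWindowDistinctCounter:
    """Exact distinct count over the last `window_size` stream elements.

    Illustrative baseline (hypothetical name, not from the paper): it keeps
    the whole window in memory, so space is linear in the window size.
    """

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window = deque()  # elements currently in the window, oldest first
        self.counts = {}       # element -> number of occurrences in the window

    def update(self, item):
        """Append one stream element and expire the oldest one if needed."""
        self.window.append(item)
        self.counts[item] = self.counts.get(item, 0) + 1
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def query(self):
        """Return the number of distinct elements in the current window."""
        return len(self.counts)


if __name__ == "__main__":
    counter = ExactWindowDistinctCounter(window_size=3)
    for element in [1, 2, 1, 1, 1, 3]:
        counter.update(element)
        print(element, counter.query())
```

Each call to `update` models one stream update and each call to `query` models one query, so the ratio of the two calls corresponds to the mix of queries and updates mentioned in the abstract.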
Similar resources
Submodular Maximization over Sliding Windows
In this paper we study the extraction of representative elements in the data stream model in the form of submodular maximization. Different from the previous work on streaming submodular maximization, we are interested only in the recent data, and study the maximization problem over sliding windows. We provide a general reduction from the sliding window model to the standard streaming model, an...
Evaluation of recommender systems in streaming environments
Evaluation of recommender systems is typically done with finite datasets. This means that conventional evaluation methodologies are only applicable in offline experiments, where data and models are stationary. However, in real world systems, user feedback is continuously generated, at unpredictable rates. Given this setting, one important issue is how to evaluate algorithms in such a streaming ...
Distinct Sampling on Streaming Data with Near-Duplicates
In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that all the near-duplicates are treated as the same element. We also extend the result to the sliding window cases in which we are only interested in the most rec...
Querying Sliding Windows Over Online Data Streams
A data stream is a real-time, continuous, ordered sequence of items generated by sources such as sensor networks, Internet traffic flow, credit card transaction logs, and on-line financial tickers. Processing continuous queries over data streams introduces a number of research problems, one of which concerns evaluating queries over sliding windows defined on the inputs. In this paper, we descri...
Querying languages over sliding windows
We study the space complexity of querying languages over data streams in the sliding window model. The algorithm has to answer at any point of time whether the content of the sliding window belongs to a fixed regular language. For regular languages, a trichotomy is shown: For every regular language the optimal space requirement is asymptotically either constant, logarithmic, or linear in the si...
Journal title:
- Front. ICT
Volume: 2015
Publication date: 2015